Abstract


Oblivious RAM (ORAM) is a mechanism that prevents a RAM from leaking information
about its secret input to an observer who sees only the access pattern. It is known that every ORAM must
incur a logarithmic overhead compared to the non-oblivious RAM. In fact, even the seemingly
weaker notion of differential obliviousness, which intuitively “protects” a single access by
guaranteeing that the observed access patterns for every two “neighboring” logical access sequences
satisfy (ε, δ)-differential privacy, is subject to a logarithmic lower bound.

In this work, we show that any Turing machine computation can be generically compiled 
into a differentially oblivious one with only doubly logarithmic overhead. More precisely, given a 
Turing machine that makes N transitions, the compiled Turing machine makes O(N · log log N)
transitions in total and the physical head movements sequence satisfies (ε, δ)-differential
privacy (for a constant ε and a negligible δ). We additionally show that Ω(log log N) overhead is
necessary in a natural range of parameters (and in the balls and bins model). 

As a corollary, we show that there exist natural data structures such as stacks and queues
(supporting online operations) on N elements for which there is a differentially oblivious
implementation on a Turing machine incurring amortized O(log log N) overhead per operation,
while it is known that any oblivious implementation must consume Ω(log N) operations
unconditionally even on a RAM. Therefore, we obtain the first unconditional separation between
obliviousness and differential obliviousness in the most natural setting of parameters where ε is
a constant and δ is negligible. Before this work, such a separation was only known in the balls
and bins model. Note that the lower bound applies in the RAM model while our upper bound 
is in the Turing machine model, making our separation stronger. 
1 Introduction


An oblivious RAM (ORAM), introduced in the seminal work of Goldreich and Ostrovsky [Gol87, 
Ost90, GO96], is a tool for “encrypting” the access pattern of any RAM so that it looks “unrelated” 
to the underlying data. It is known that any ORAM scheme must incur at least Ω(log N) overhead,
where N is the size of the memory. This was first shown by Goldreich and Ostrovsky [GO96] in the
balls and bins model¹ and assuming that no cryptographic assumptions are used. More recently,
Larsen and Nielsen [LN18] proved the same lower bound without the two aforementioned restrictions
but requiring the ORAM to support operations arriving in an online manner. In fact, a follow-up
work by Jacob et al. [JLN19] showed that Ω(log N) overhead is necessary even for obliviously
implementing very specific data structures (as defined in [WNL+14]) such as stacks, queues, and
more. 

Apparently, logarithmic overhead is necessary even for implementing a RAM with a (seemingly) 
much weaker security guarantee than full obliviousness [PY19]. Persiano and Yeo [PY19] considered 
the notion of differentially oblivious RAM,² a relaxation of ORAM that only protects individual
operations by guaranteeing (ε, δ)-differential privacy for the observed access pattern of the RAM
(see Section 2.3 for a formal definition). Note that differential privacy concerns the observed
output of some algorithm. In our context, the output of an algorithm consists of the transcript 
of the computation: the physical memory accesses performed during the computation. Differential 
obliviousness was also studied in the context of specific functionalities by Chan et al. [CCMS19] and 
Beimel et al. [BNZ19]. It is shown that there are tasks for which obtaining differential obliviousness 
might be easier than full obliviousness. For instance, Chan et al. [CCMS19] show that there is a 
differentially oblivious algorithm for sorting N records according to a 1-bit key while maintaining 
the relative ordering of records with identical keys in time O(N · log log N),³ while [LSX19] showed
a conditional Ω(N · log N) lower bound for full-fledged obliviousness in the balls and bins model.
This leaves the following natural question open. 


Is there an unconditional separation between obliviousness and differential obliviousness? 


Let us remark that in the above question we are interested in the most standard models and 
range of parameters. For RAMs, we consider the standard word-RAM where each memory word 
is large enough to store its own logical address, where word-level addition and Boolean operations 
can be done in unit cost, and where the CPU has a constant number of private registers. For
(ε, δ)-differential obliviousness, we want schemes that are secure for ε being a fixed constant and δ
being a negligible function. For Turing machines, we allow an arbitrary number (which is fixed as part
of the machine’s description) of one-dimensional bi-directional infinite work tapes, where in every
step each head can move left, move right, or stay in place.


1.1 Our Results 


Separating obliviousness from differential obliviousness. We present a large class of
functionalities that can be made differentially oblivious with only O(log log N) overhead. The class


¹This model assumes that each memory word is indivisible and restricts the ORAM to only move blocks around
and not apply any non-trivial encoding of the underlying secret data; see Boyle and Naor [BN16].

²Persiano and Yeo [PY19] called this notion differentially private RAM, but we prefer to use differentially oblivious
RAM to (1) relate to the notion of oblivious RAM and stress that the goal is to preserve the physical access pattern’s
privacy and (2) be aligned with previous work on the topic (Chan et al. [CCMS19]).

³Maintaining the relative ordering of records is called stability. Without stability, sorting records according to
1-bit keys is known to be doable (deterministically and obliviously) in linear time [AKL+20a, AKL+20b].


includes many natural and useful algorithms and data structures such as stacks and queues and 
therefore implies an unconditional separation between obliviousness and differential obliviousness. 


Theorem 1.1 (A separation; informal). For any ε, δ > 0, there exists a data structure (e.g., a
stack or a queue) supporting N operations for which:


1. Any oblivious implementation (even on a RAM) requires Ω(N · log N) operations;


2. There is an (ε, δ)-differentially oblivious two-tape Turing machine (defined below) that requires
O(N · (log(1/ε) + log log N + log log(1/δ))) operations.


In particular, letting ε > 0 be a constant and δ = 2^{−log^2 N} (which is negligible), the number
of operations incurred by the differentially oblivious machine is O(N · log log N).
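
As a sanity check on this parameter setting, the following minimal Python sketch evaluates the overhead factor log(1/ε) + log log N + log log(1/δ) from Theorem 1.1. The function name `overhead_factor`, the use of base-2 logarithms, and the choice to pass log(1/δ) directly (to avoid floating-point underflow when δ is tiny) are assumptions made for illustration only.

```python
import math


def overhead_factor(n, eps, log2_inv_delta):
    """Evaluate log(1/eps) + log log n + log log(1/delta) with base-2 logs.

    log2_inv_delta is log2(1/delta), supplied directly because delta itself
    (e.g., 2^(-log^2 n)) would underflow a float for large n.
    """
    return (
        math.log2(1.0 / eps)
        + math.log2(math.log2(n))
        + math.log2(log2_inv_delta)
    )


# Hypothetical setting: n = 2^20, eps = 1, delta = 2^(-log^2 n) = 2^(-400).
factor = overhead_factor(2**20, 1.0, 20**2)
```

With ε = 1 and δ = 2^{−log^2 N}, the last term equals 2 · log log N, so the whole factor is 3 · log log N, which is O(log log N) as claimed.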


Differentially oblivious Turing machines. The above theorem follows from a much more 
general result about differentially oblivious Turing machines. Oblivious Turing machines were first 
introduced in 1979 by Pippenger and Fischer [PF79]. In this model, “memory accesses” correspond 
to the head’s movements throughout the execution of the algorithm (i.e., Left, Right, or Stay). 
Pippenger and Fischer showed how any multi-tape Turing machine can be obliviously simulated 
by a two-tape Turing machine with a logarithmic slowdown in running time. More precisely, any 
Turing machine that makes N steps can be simulated obliviously while consuming O(N · log N) steps.
The simulation is deterministic and perfectly oblivious: the same sequence of head movements is 
observed for any two inputs. 

Adapting the notion of differential obliviousness to the Turing machine model, we show that 
any Turing machine that makes N steps can be simulated by a differentially oblivious machine 
while making only O(N · log log N) steps. Here, neighboring sequences of head movements are ones
where only one transition is different. For instance, the logical sequences of transitions {Left, Right, 
Left, Right} and {Left, Right, Right, Right} are neighboring. 


Theorem 1.2 (A differentially oblivious Turing machine; see Theorem 5.1). For any ε, δ > 0,
any k-tape Turing machine that makes at most N steps can be simulated by an (ε, δ)-differentially
oblivious machine with max{2, k} tapes making O(N · (log(1/ε) + log log N + log log(1/δ))) steps.


As above, letting ε > 0 be a constant and δ = δ(N) be a particular negligible function, the
number of steps incurred by the differentially oblivious machine is O(N · log log N). We note that
the constant hidden in the O notation depends only on the description size of the given Turing
machine (i.e., its alphabet size, number of tapes, etc.). Let us remark that the number of tapes we
use is essentially optimal since even without any security requirements simulating a k-tape Turing
machine for k > 3 on a (k − 1)-tape one is not known to be possible with better than logarithmic
overhead in steps (Hennie and Stearns [HS66]). Also, simulating a 2-tape machine on a single 
tape machine has polynomial overhead (Hartmanis and Stearns [HS65] for the upper bound and 
Hennie [Hen65] for a lower bound). 


Theorem 1.1 using Theorem 1.2. Consider (for instance) the stack data structure on N 
elements, supporting (“online”) PUSH and Pop operations. By Theorem 1.2 and using the fact 
that a stack can be implemented in linear time on a Turing machine, there is an (ε, δ)-differentially
oblivious Turing machine implementing it whose overhead is O(log log N) for a constant ε and
negligible δ, as above. As mentioned, the logarithmic lower bound follows from Jacob et al. [JLN19].
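
To illustrate why a stack fits the Turing machine model, the following minimal Python sketch stores the stack on a single tape with the head resting on the top element. It is an illustration of one natural linear-time implementation, not the construction of this paper; the name `run_stack` and the tuple encoding of operations are assumptions made for the example.

```python
def run_stack(ops):
    """Simulate a stack kept on one tape; return (popped values, head moves).

    ops is a list of ("push", value) or ("pop",) tuples (hypothetical encoding).
    The head rests on the top element; cell 0 holds a bottom-of-stack marker.
    """
    tape = {0: "$"}  # sparse tape: position -> symbol
    head = 0
    popped, moves = [], []
    for op in ops:
        if op[0] == "push":
            head += 1              # move right to a fresh cell
            tape[head] = op[1]
            moves.append("R")
        elif op[0] == "pop" and head > 0:
            popped.append(tape.pop(head))
            head -= 1              # move left to the new top
            moves.append("L")
        else:
            moves.append("S")      # pop on an empty stack: stay in place
    return popped, moves


popped, moves = run_stack([("push", 1), ("push", 2), ("pop",), ("pop",), ("pop",)])
```

Each operation costs one head movement, so N operations take O(N) steps, and the sequence of head movements directly mirrors the sequence of PUSH and POP operations.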


A lower bound. Lastly, we observe that a lower bound of Chan et al. [CCMS19] can be tweaked 
to show that our construction is essentially optimal by showing that Ω(log log N) overhead is
necessary for differential obliviousness in a natural range of parameters and in the balls and bins 
model. 


Theorem 1.3 (A lower bound; see Theorem 6.1). There exists an algorithmic task for which there 
is a Turing machine that on input of size N completes it in O(N) steps. On the other hand, for 
any 0 < s < √N, ε > 0, 0 < β < 1, and 0 < δ < β · (ε/s) · e^{−2εs}, any (ε, δ)-differentially oblivious
implementation in the balls and bins model (even on a RAM) for this task must consume Ω(N · log s)
steps with probability 1 − β.


In particular, for a constant ε > 0 and s > log^2 N, we can set δ = e^{−Ω(log^2 N)} (which is negligible)
and get that Ω(log s) = Ω(log log N) overhead is necessary. Note that if we want δ = 2^{−N^{Ω(1)}}, then
the lower bound above says that the best we can hope for is Q(log N) overhead. As mentioned, 
with logarithmic overhead we can actually get perfect obliviousness for any Turing machine [PF79]. 


1.2 Related Work 


Goldreich and Ostrovsky [Gol87,GO96] showed that any RAM that uses a memory of size N 
and makes T accesses, can be made oblivious using only O(T · poly log N) accesses. The resulting
RAM is probabilistic and obliviousness holds against polynomial-time distinguishers assuming the
existence of one-way functions. The concept of oblivious RAM has inspired an immense amount of
research. One line of work focuses on applications of such compilers to cryptography and security,
including applications in cloud computing, secure processor design, multi-party computation, and
more (for example, [OS97, SS13, SSS12, BNP+15, FDD12, RYF+13, MLS+13, FRK+15, WHC+14,
GHJR15, LWN+15, ZWR+16, LO13, WST12]). Another line of work focuses on improving the
overhead of the compiler [SCSL11, KLO12, GM11, CGLS17, SvDS+13, WCS15]. Only recently, a
couple of works [PPRY18, AKL+20a] have resolved the problem by presenting a compiler whose
overhead is O(log N) (while still relying on one-way functions).

Patel et al. [PPY19] considered the natural question of what kind of security can one hope for 
while limiting the overhead of a RAM simulation to constant. They show a construction of an 
(ε, 0)-differentially oblivious RAM with O(1) overhead for ε = O(log N) and also assuming that the
client can store ω(log N) records. They also proved a lower bound which quantitatively improves
upon the one of [PY19] in the dependence on ε but is qualitatively worse since it is in the balls and
bins model. Throughout this work, we focus on the setting where ε is a fixed constant and also
that the client’s storage is a constant number of blocks. 

The work of Pippenger and Fischer [PF79] came in a long line of works trying to pin down 
the exact relation between various different computational models. One notable work is that of 
Hennie and Stearns [HS66] who showed that any multi-tape Turing machine can be simulated by 
a two-tape machine with logarithmic overhead. Pippenger and Fischer’s result can be viewed as a 
similar compiler except that their resulting machine is also oblivious. Note that the result of [HS66] 
is the reason why one should not hope to improve the number of tapes in the resulting machine 
in Theorem 1.2 to two (as this task, even without privacy, is not known to be possible with less 
than logarithmic overhead). Simulating a 2-tape Turing machine on a single tape machine requires 
polynomial overhead due to Hartmanis and Stearns [HS65] and Hennie [Hen65].

Some of our ideas in the differentially oblivious Turing machine construction are reminiscent 
of the aforementioned differentially oblivious algorithm (in the RAM model) for stable tight com- 
paction due to Chan et al. [CCMS19]. Technically, their algorithm uses similar tools from the 
differential privacy literature (namely, differentially private prefix sums due to Chan et al. and
Dwork et al. [CSS10, CSS11, DNPR10]), but the way they use them differs in nature from our approach.
First, our target machine is a Turing machine rather than a RAM, and therefore
standard building blocks such as oblivious sorting (which they use) are inapplicable. Second, even
if we allow compiling a Turing machine to a differentially oblivious RAM (rather than insisting on
a Turing machine as the target machine), we still cannot directly use their techniques for
constructing stable tight compaction because their techniques, which rely on oblivious sorting, are
offline in nature and thus not compatible with the online nature of our differentially oblivious
simulation.


1.3 Technical Roadmap 


In this overview we will focus on simulating a Turing machine with a single tape for N steps. There 
are many complications and technical difficulties that arise in the multi-tape case, but we refer to 
the technical sections for details. 

Given a Turing machine our goal is to hide the location of the head during the execution of 
the machine, in a differentially private manner. We first view this as an independent problem: 
given an execution of a Turing machine, output the locations of the head at pre-defined points
in time throughout the execution in a differentially private manner. This is of course a necessary 
sub-problem to solve (as any differentially oblivious Turing machine must solve it). However, for 
now it is not at all obvious why it would be relevant for us, but we will explain how this algorithm 
helps later. 

To this end, we first develop an efficient differentially private algorithm for estimating the
head’s location at pre-defined points in time. As a first (naive) attempt, we could add a fresh 
Laplacian noise every time we need an estimate, but then either the privacy will suffer from a loss 
that depends on the number of required estimates or the efficiency of the scheme will scale with 
this number (since the noise would need to be really large). 

To get around this, inspired by the work of Chan et al. [CCMS19], we use a differentially private 
prefix sum algorithm [CSS10,CSS11, DNPR10] to account for the location of the head. Recall that 
in the prefix sum algorithm, a stream of numbers arrives in an online manner and the algorithm
outputs the sum of all numbers seen so far, after seeing every number. We set up the numbers to
correspond to head movements (−1 for “Left” and 1 for “Right”) and show that this approach incurs
only poly log N loss in privacy budget, which is good enough in terms of privacy. One challenge 
that we run into is that we need to implement the differentially private prefix sum algorithm on 
a Turing machine. It turns out that every time we need to get an estimate of the head’s location 
(i.e., get a prefix sum), we need to pay some non-trivial factor in running time and so we need to 
minimize the number of such estimations. Therefore, we design our algorithm to work with only 
one estimate of the head’s location every poly log N steps and amortize the cost of this estimation 
while processing the next poly log N steps of the Turing machine. 
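
The prefix-sum idea recalled above can be sketched in a few lines of Python. This is a textbook-style, RAM-level rendering in the spirit of the binary-tree mechanism of [CSS10, CSS11, DNPR10], with pure ε-differential privacy and Laplace noise; it is not the Turing machine implementation developed in this paper. The names `laplace` and `noisy_prefix_sums`, the `sensitivity` parameter (2, since changing Left to Right changes a stream entry by 2), and the noise calibration are assumptions made for illustration.

```python
import math
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as a difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_prefix_sums(stream, eps, sensitivity=2, rng=None):
    """Return a noisy estimate of every prefix sum of `stream` (online).

    Each stream entry contributes to at most `levels` partial sums, so each
    partial sum receives Laplace noise of scale sensitivity * levels / eps.
    """
    rng = rng or random.Random()
    levels = max(1, len(stream).bit_length())
    scale = sensitivity * levels / eps
    exact = [0] * levels    # exact partial sums, one per dyadic level
    noisy = [0.0] * levels  # their noisy counterparts
    out = []
    for t, x in enumerate(stream, start=1):
        i = (t & -t).bit_length() - 1  # index of the lowest set bit of t
        exact[i] = sum(exact[:i]) + x  # merge lower levels into level i
        for j in range(i):
            exact[j], noisy[j] = 0, 0.0
        noisy[i] = exact[i] + laplace(scale, rng)
        # The prefix [1, t] is covered by the levels set in t's binary form.
        out.append(sum(noisy[j] for j in range(levels) if (t >> j) & 1))
    return out


# Head movements encoded as -1 (Left), 0 (Stay), 1 (Right).
estimates = noisy_prefix_sums([1, 1, -1, 0, 1, -1, -1, 1], eps=1.0, rng=random.Random(0))
```

Every output is a sum of at most logarithmically many noisy partial sums, which is why the error grows only polylogarithmically in the stream length rather than linearly in the number of estimates.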


Using the estimate. Once we have a good-enough estimate of the head’s location every poly log N 
steps, all that is left is to copy the nearby positions to a smaller oblivious Turing machine which 
we use to simulate the next poly log N steps. We set up the parameters in such a way that we copy 
enough positions around the estimated head’s location to actually include the real head position 
along with the relevant tape around it to perform the next polylog N steps so the above is well 
defined. The oblivious Turing machine that we need must provide an “initialization” procedure
that allows us to start an oblivious Turing machine from an existing memory, and a “destruction”
procedure that allows us to extract the memory to its original structure at the end of the
execution. Pippenger and Fischer’s [PF79] construction does not provide such procedures so we describe
a variant that does. As an independent contribution, our new oblivious Turing machine is described
in a language that more closely resembles the hierarchical oblivious RAM construction of Goldreich
and Ostrovsky [Gol87,GO96] so it might be easier to understand for those who are familiar with 
the latter. 

Lastly, let us remark that the above description was very high level and glossed over many 
technical details. For example, one technicality arises because we have to use the tapes sparingly in 
the compiled Turing machine to get a theorem that is tight in the number of tapes. To achieve this, 
we develop algorithmic tricks that allow us to reuse the same tape for multiple purposes without 
incurring any overhead in asymptotic running time. Other technical challenges arise because, unlike 
earlier explorations of differential obliviousness [CCMS19, BNZ19], our target machine is a Turing
machine rather than a RAM. This imposes additional constraints for our algorithm design since we 
cannot use common building blocks such as oblivious sorting. Moreover, the online nature of our 
differentially oblivious simulation also renders some previous building blocks inadequate (which we
discuss more in the Related Work section). We refer to the technical sections for details. 


1.4 Future Directions 


This paper shows that a wide class of programs can be compiled into differentially oblivious
counterparts with much smaller than logarithmic overhead. The class consists of Turing machine
computations, in which every two consecutive memory accesses are to adjacent locations. There are
several open problems that our work puts forward:


• Our upper bound has a multiplicative doubly-logarithmic term and we managed to show that
it is necessary in a natural range of parameters and in the balls and bins model. It would be
interesting to either show tightness of our upper bound for all settings of parameters or lift
the balls and bins model restriction.


• Are there other meaningful relaxations of obliviousness that suffice for (some) applications
and additionally could result in more efficient constructions?


• We focused on a statistical setting (as usually done in the context of differential privacy).
Can cryptography help?


2 Preliminaries 


For an integer n ∈ ℕ we denote by [n] the set {1, ..., n}. A function negl: ℕ → ℝ⁺ is negligible if
for every constant c > 0 there exists an integer N_c such that negl(λ) < λ^{−c} for all λ > N_c.


2.1 Turing Machines 


We follow the presentation of Arora and Barak [AB09] for the definition of a k-tape Turing machine.
A tape is an infinite bi-directional line of cells, each of which can hold a symbol from a finite set 
called the alphabet. Each tape is associated to a tape head that can potentially read or write 
symbols to the tape one cell at a time. The machine’s computation is divided into discrete time 
steps, and the head can either stay in place or move left or right one cell in each step. More 
formally, a Turing machine M is described by a tuple (Γ, Q, Δ), where Γ is a set of symbols that
M’s tapes can contain, Q is the set of M’s possible states, and Δ: Q × Γ^k → Q × Γ^k × {L, S, R}^k
is M’s transition function.

If the machine is in state q ∈ Q and (σ_1, σ_2, ..., σ_k) are the symbols currently being read in
the k tapes, and Δ(q, (σ_1, ..., σ_k)) = (q', (σ'_1, ..., σ'_k), z), where z ∈ {L, S, R}^k, then at the next
step the σ symbols in the k tapes will be replaced by the σ' symbols, the machine will be in state
q', and the k heads will move Left, move Right, or Stay in place, as given by z. There are additionally a
read-only tape for the input and a write-only tape for the output, and perhaps a randomness tape 
if needed, but we ignore those when counting the number of tapes and only account for the work 
tapes. The space complexity of a Turing machine is the total number of cells it utilizes throughout 
its execution (over all tapes). 
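
The transition rule above can be made concrete with a minimal Python sketch of a k-tape machine that records its head movements. The function name `run`, the dictionary encoding of the transition function, the blank symbol "_", and the toy machine are all assumptions made for illustration; input, output, and randomness tapes are omitted, matching the convention of counting only work tapes.

```python
from collections import defaultdict

STEP = {"L": -1, "S": 0, "R": 1}


def run(delta, k, start_state, halt_state, max_steps, blank="_"):
    """Run a k-tape machine; return (list of movement k-tuples, final heads).

    delta maps (state, symbols_read) to (new_state, symbols_written, moves),
    where the last three components are k-tuples.
    """
    tapes = [defaultdict(lambda: blank) for _ in range(k)]
    heads = [0] * k
    state = start_state
    movements = []
    while state != halt_state and len(movements) < max_steps:
        read = tuple(tapes[j][heads[j]] for j in range(k))
        state, written, z = delta[(state, read)]
        for j in range(k):
            tapes[j][heads[j]] = written[j]  # overwrite the cell under head j
            heads[j] += STEP[z[j]]           # then move head j as dictated by z
        movements.append(tuple(z))
    return movements, heads


# Toy single-tape machine (hypothetical): write three 1s, then halt.
toy = {
    ("q0", ("_",)): ("q1", ("1",), ("R",)),
    ("q1", ("_",)): ("q2", ("1",), ("R",)),
    ("q2", ("_",)): ("halt", ("1",), ("S",)),
}
movements, heads = run(toy, k=1, start_state="q0", halt_state="halt", max_steps=10)
```

The returned list of movement tuples is exactly the kind of object whose privacy is studied in the rest of the paper; the tape contents are not part of it.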

The definition above is quite robust to the choices one makes regarding the alphabet size, the 
number of tapes, etc., since they are all equivalent in terms of complexity up to small factors. We
recall the known facts which can be found, for example, in Arora and Barak [AB09].


Fact 2.1. It holds that: 


1. Every function f that is computable in time T using alphabet Γ, can be computed in time
O(log |Γ| · T) using an alphabet of size O(1).


2. Every function f that is computable in time T using k tapes, can be computed in time O(k · T²)
on a single tape machine and in time O(k · T · log T) on a two-tape machine.


3. Every function f that is computable in time T using k bi-directional tapes, can be computed
in time O(T) using k standard (uni-directional) tapes.


4. Every function f that is computable in time T using k tapes, can be computed in time k · T
using k tapes such that in each step only one of the tapes moves.


We mention the dependence on k in the above terms for explicitness even though it is a constant 
(and throughout the paper we treat it as constant). 

In this work, we care about logarithmic factors so, by default, our Turing machine model is 
that of a two-tape machine. By the above, it does not matter if we consider uni-directional or 
bi-directional tapes. Constant factors in the alphabet size do not matter as well. All of the above 
only affect the constants which are hidden inside the O notation. 


2.2 Differential Privacy 


Differential privacy, introduced by Dwork et al. [DMNS06], is a property of algorithms that, very 
roughly, guarantees “security” for a single record in the input. Namely, if the algorithm acts on 
the information of a set of individuals, from the output it is hard to decide whether a particular 
individual’s information was used in the computation. This is formalized as follows. Let A be a 
probabilistic algorithm that takes as input a dataset. Let Im(A) be the set of all possible outputs 
of A. The algorithm A is said to be (ε, δ)-differentially private if for all datasets D_0 and D_1 that
differ only on one entry, and all possible subsets S ⊆ Im(A), it holds that


Pr[A(D_0) ∈ S] ≤ e^ε · Pr[A(D_1) ∈ S] + δ,


where e is the base of the natural logarithm. 
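
For a mechanism with a small finite output space, the inequality above can be verified by brute force over all subsets S. The following minimal Python sketch does so for randomized response on a single bit (report the true bit with probability 3/4), a standard example that is not taken from this paper. The name `is_dp`, the dictionary representation of output distributions, and the small numerical tolerance are assumptions made for illustration.

```python
import math
from itertools import chain, combinations


def is_dp(p0, p1, eps, delta, tol=1e-12):
    """Check Pr[A(D0) in S] <= e^eps * Pr[A(D1) in S] + delta for every S,
    in both directions, given the output distributions p0 and p1 as dicts."""
    outcomes = sorted(set(p0) | set(p1))
    subsets = chain.from_iterable(
        combinations(outcomes, r) for r in range(len(outcomes) + 1)
    )
    for s in subsets:
        a = sum(p0.get(o, 0.0) for o in s)
        b = sum(p1.get(o, 0.0) for o in s)
        if a > math.exp(eps) * b + delta + tol:
            return False
        if b > math.exp(eps) * a + delta + tol:
            return False
    return True


# Randomized response: the true bit is reported with probability 3/4.
rr_on_0 = {0: 0.75, 1: 0.25}
rr_on_1 = {0: 0.25, 1: 0.75}
```

Under these assumptions, `is_dp(rr_on_0, rr_on_1, math.log(3), 0.0)` holds, while a smaller ε such as 0.5 fails unless δ is increased to absorb the gap.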

We emphasize that differential privacy (as defined above) provides a statistical property which
holds for all “distinguishers”, even computationally unbounded ones. There are relaxations of the
differential privacy guarantee that hold only for efficient (computationally bounded) distinguishers. For
example, Mironov et al. [MPRV09] defined such a relaxation and used it to obtain more accurate 
differentially-private protocols. In this work, however, we consider only the classical statistical 
notion. 

We refer to Dwork and Roth [DR14] for more information on differential privacy. 


2.3 (Differentially) Oblivious Turing Machines 


Obliviousness. Obliviousness is nowadays usually defined for RAMs and it guarantees that the 
access pattern of the RAM is “independent” of the underlying input. More specifically, given a 
RAM M and an input I, we consider a random variable Accesses(M, I) that corresponds to the
ordered sequence of memory locations M accesses during an execution on input I. We then require
that the distribution of Accesses(M, I_1) is indistinguishable from Accesses(M, I_2) for any I_1 and I_2 of
the same length. The precise notion of indistinguishability can be either computational, statistical, 
or perfect, depending on the context. 

A Turing machine can be thought of as a restricted version of a RAM where random accesses
are not allowed: any two consecutive accesses must be to adjacent locations. In the
context of Turing machines, this corresponds to allowing the head to move only one step right or 
left at a time. This directly induces an adaptation of the notion of obliviousness to the Turing 
machine model: the tape’s head movements during the execution of the algorithm should not
leak information about the inputs. Again one can define various notions of obliviousness for this 
restricted model, including computational, statistical, or perfect. We consider the strong notion of 
deterministic perfect obliviousness. 


Definition 2.2 (Oblivious Turing machine). A Turing machine M is said to be oblivious if for 
every input x ∈ {0,1}* and i ∈ [N], the location of each of M’s heads at the ith step of execution
on input x is only a function of |x| and i. 
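
Definition 2.2 can be checked by brute force for small input lengths. The following minimal Python sketch compares the head traces of a machine over all binary inputs of a given length. The names `is_oblivious`, `scan_trace`, and `seek_trace`, and the modeling of a machine by a function that returns its list of head positions, are assumptions made for illustration.

```python
from itertools import product


def is_oblivious(head_trace, n):
    """Return True if all inputs of length n yield the same head trace."""
    traces = {
        tuple(head_trace("".join(bits))) for bits in product("01", repeat=n)
    }
    return len(traces) == 1


def scan_trace(x):
    """Head positions of a full left-to-right sweep over the input."""
    return list(range(len(x)))


def seek_trace(x):
    """Head positions of a sweep that stops at the first '1' (if any)."""
    stop = x.index("1") if "1" in x else len(x) - 1
    return list(range(stop + 1))
```

The full sweep depends only on the input length and the step index, so it passes the check; the early-stopping sweep reveals where the first 1 is located and therefore fails it.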


Differential obliviousness. Differential obliviousness was introduced by Chan et al. [CCMS19] 
as a relaxation of obliviousness for RAMs. Recall that differential privacy is a framework for 
protecting individual records in a (large) database processed via some algorithm. As we
explain in Section 2.2, this is formalized by saying that some observable event happens (almost) as
likely in a database that has a particular record and in one that does not. What are those records 
and databases in our context? 

There are two possibilities that come to mind. The first is the input for the algorithm that we 
execute. The second is a sequence of memory accesses / head movements. We follow previous work 
(e.g., [CCMS19, PY19]) and consider the latter option, adapted to the Turing machine setting. 

Note that, in some cases, neighboring inputs translate to neighboring head movements, and in 
some cases they do not. For example, consider a streaming setting where events come in one by 
one, and depending on the type of the event, you either pop or push a stack. In such a scenario, 
neighboring inputs translate to neighboring head movements. However, in other settings, this is 
not necessarily the case. Imagine that the first bit of the input tells you a direction and the rest 
of the input is interpreted as a number. The program makes this number of steps in the specified 
direction. Clearly, two inputs that differ only in their first bit result with a very different sequence 
of head movements. 

To summarize, we formalize our notion by requiring (ε, δ)-differential privacy for the observed
access pattern/sequence of head movements. Specifically, we say that two sequences of accesses I_0
and I_1 are neighboring if they are of the same length and differ in exactly one location accessed.
Lastly, we mention that defining neighboring inputs w.r.t. the observed sequence of head
movements, as we do next, still implies a privacy guarantee for the inputs using standard group privacy
theorems [DMNS06].


Definition 2.3 (Neighboring head movements). Let M be a k-tape Turing machine. For J ∈
{0,1}*, let Movements(M, J) ∈ ({L, R, S}^k)^* be the sequence of head movements that M does on
input J. Two inputs J_0, J_1 ∈ {0,1}* are called neighboring if


1. |Movements(M, J_0)| = |Movements(M, J_1)| and
2. Movements(M, J_0) and Movements(M, J_1) differ in exactly one location.


That is, letting (σ^{(b)}_1, ..., σ^{(b)}_N) = Movements(M, J_b) for b ∈ {0,1}, where each
σ^{(b)}_i = (σ^{(b)}_{i,1}, ..., σ^{(b)}_{i,k}), there is exactly one pair (i*, j*) ∈ [N] × [k] such that
σ^{(0)}_{i*,j*} ≠ σ^{(1)}_{i*,j*}, while for all other (i, j) ∈ [N] × [k] \ {(i*, j*)}, it holds that
σ^{(0)}_{i,j} = σ^{(1)}_{i,j}.
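
The two conditions of Definition 2.3 translate directly into code. The following minimal Python sketch takes two movement sequences, each a list of k-tuples over {L, S, R}, and tests whether they are neighboring. The name `are_neighboring` and the list-of-tuples representation are assumptions made for illustration, and both sequences are assumed to come from machines with the same number of tapes.

```python
def are_neighboring(m0, m1):
    """Return True iff m0 and m1 have equal length and differ in exactly one
    (step, tape) coordinate. Each argument is a list of k-tuples."""
    if len(m0) != len(m1):
        return False
    diffs = sum(
        a != b
        for step0, step1 in zip(m0, m1)
        for a, b in zip(step0, step1)
    )
    return diffs == 1


# The single-tape example from Section 1.1.
seq_a = [("L",), ("R",), ("L",), ("R",)]
seq_b = [("L",), ("R",), ("R",), ("R",)]
```

Note that on a multi-tape machine, two sequences that differ on two tapes within the same step are not neighboring under this definition, since they differ in two (i, j) pairs.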


Given this notion of neighboring inputs, we give the definition of differential obliviousness. 


Definition 2.4 ((ε, δ)-differentially oblivious Turing machine). A Turing machine M is
(ε, δ)-differentially oblivious iff for any neighboring inputs J_0 and J_1 and any set S ⊆ ({L, R, S}^k)^* of
possible sequences of head movements, it holds that


Pr[Movements(M, J_0) ∈ S] ≤ e^ε · Pr[Movements(M, J_1) ∈ S] + δ.


We emphasize that in the above the set S consists of only head movements. It does not include
the contents of the cells. Also, we assume that the initial head position is fixed and known. Without 
loss of generality, it is also okay to allow the adversary to see the accesses (but not contents) to the 
input, output and randomness tapes (since we can start/end by copying to/from the main tape). 
Not allowing access to cells’ content is justified by assuming that they are already hidden from 
the distinguisher in some way. This could be by some form of encryption. This means that we 
only care about the cost of making the access pattern/head movements (differentially) oblivious 
and ignore the underlying method used to hide the cells’ content. This is the standard adversarial 
model considered in the oblivious RAM literature. 

Looking forward, we achieve Definition 2.4 by giving a compiler from a (deterministic) Turing
machine TM to a (randomized) Turing machine TM’. The latter TM’ has a security property that 
depends on the inputs being fed into it. Specifically, if TM’ is executed on two inputs that result 
with neighboring head movements on the original machine TM, then the resulting head movements 
in TM’ will be indistinguishable. Previous works, e.g., [PY19], have the same definition but with 
head movements replaced with logical memory access sequence. 


3 Estimating Heads’ Locations 


In this section, we present an algorithm running on a Turing machine that outputs estimates to the 
location of the heads in a Turing machine computation. More precisely, the input to the algorithm 
is a sequence of movements of the heads of the machine (i.e., Left, Right or Stay for each tape), 
and it outputs an estimate to the location of the head in a-priori fixed intervals of time in an 
online fashion. The algorithm has four properties: (1) its estimates are not too far from the true
position of the head at a given time, (2) the estimates are differentially private, (3) the algorithm’s
head movements are themselves oblivious (i.e., data-independent), and (4) the algorithm is very efficient.
The interval at which we output an estimate on the location of the heads is a parameter, denoted p. 


Theorem 3.1. Fix k ∈ ℕ. There exists an algorithm EstimateHead_{ε,δ} such that for any ε, δ > 0, the
following holds. Fix any stream a = a_1, a_2, ..., a_N ∈ {L, S, R}^k that corresponds to the movements
of the heads of a k-tape Turing machine. Treating L as −1, S as 0, and R as 1, let σ_i = Σ_{j=1}^{i·p} a_j
for i ∈ [N/p] (i.e., the true position of the heads every p steps). Let {σ̃_i}_{i ∈ [N/p]} denote a possible
output of the algorithm EstimateHead_{ε,δ} when fed a as an input in an online fashion. It holds that:


1. Utility: With probability 1 over the randomness of EstimateHead_{ε,δ}, it holds that


max_{i ∈ [N/p]} |σ̃_i − σ_i| ∈ O((1/ε) · log^{1.5} N · log(1/δ)).


2. Differential privacy: $\mathsf{EstimateHead}_{\epsilon,\delta}$ is $(\epsilon, \delta)$-differentially private (as per Section 2.2).
Here, neighboring sequences are defined in the natural way by allowing only one of the k
indices in one of the N $a_j$’s to differ between the two sequences.


3. Obliviousness: The algorithm itself is perfectly oblivious. 


4. Efficiency: The algorithm runs in time $O(N + (N/p) \cdot \log N)$.


Proof. The algorithm builds on the differentially private prefix-sum algorithm of Chan et al. [CSS10,
CSS11] and Dwork et al. [DNPR10]. Their algorithms address the problem of continuously estimating
the prefix sums of elements in a given stream of numbers while maintaining differential privacy.
We follow the presentation of these algorithms from Dwork and Roth [DR14, §12.3]. The algorithm
is given a stream of numbers $b = b_1, b_2, \ldots, b_N \in \{-1, 0, 1\}$ that it sees in an online
fashion. After seeing $b_1, \ldots, b_t$, the algorithm outputs an approximation of $\sum_{i=1}^{t} b_i$. This task is
almost what we need to prove our theorem for k = 1 (i.e., the machine has one tape). Indeed, a
movement left (resp. right) can be interpreted as -1 (resp. 1) and staying in place corresponds to
seeing 0. The location of the head is exactly the sum of those numbers. Additionally, it is not hard
to observe that their algorithm is in fact oblivious (see below). Nevertheless, the running time of
their algorithm on a Turing machine is $O(N \cdot \log N)$ (see below). So, we need to (1) extend the
algorithm to handle any k > 1 tapes and (2) show how to implement it in the specified running
time on a Turing machine. Since both goals are somewhat non-trivial to achieve, let us first briefly
recall their algorithm and state its guarantees, and then describe our modifications.

Assume that N is a power of 2 (for simplicity and without loss of generality). We associate the
N numbers to leaves of a full binary tree and then label each node in the tree with an “interval”.
The ith leaf (for $i \in [N]$) is labeled with $[i, i]$. An internal node is labeled with the interval that
is the union of the intervals associated with its children. Now, with each node, labeled $[s, t]$ in
this tree, we associate a noisy count that approximates the sum of the values seen in positions
$s, s+1, \ldots, t$ by adding noise from the appropriate distribution. In [CSS10, CSS11, DNPR10] the
added noise was sampled from $\mathrm{Lap}((1 + \log_2 N)/\epsilon)$, where $\mathrm{Lap}(s)$ denotes the (continuous) Laplace
distribution with mean 0 and variance $2s^2$. To output $\tilde{c}_i$ (i.e., the approximation of $\sum_{j=1}^{i \cdot p} a_j$), we
write $t = i \cdot p$ in binary to find at most $\log_2 N$ intervals whose union is $[1, t]$, and compute the sum
of the corresponding noisy counts. These intervals are associated to the nodes which are called
the frontier. This algorithm satisfies $(\epsilon, 0)$-differential privacy and satisfies the following utility
property where $\delta$ is a parameter: With probability $1 - \delta$ over the randomness of the algorithm,

$$\max_{i \in [N/p]} |\tilde{c}_i - c_i| \le O\big((1/\epsilon) \cdot \log^{1.5} N \cdot \log(1/\delta)\big).$$


It is easy to make the utility property hold with probability 1 by outputting the exact
prefix sum in the clear whenever the error in the output is too large. This causes the algorithm
to be $(\epsilon, \delta/2)$-differentially private, as needed. From the description of the protocol, the only step
that depends on the input sequence is the one where we compute the sum of the noisy counts along
the $\log_2 N$ intervals; all other steps depend only on N (and fresh randomness). Since computing
the sum can be easily made (perfectly) oblivious, we get that the algorithm itself is (perfectly)
oblivious. This property will be preserved throughout the following modifications.
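
The tree-based mechanism recalled above can be sketched as follows. This is a minimal offline illustration rather than the online, oblivious Turing machine implementation: the function names are ours, noise is drawn lazily per tree node, and passing `eps = math.inf` (an assumption made only for testing) switches the noise off.

```python
import math
import random


def laplace(scale, rng):
    """Sample Lap(scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def frontier(t):
    """Dyadic intervals (1-indexed, inclusive) whose disjoint union is [1, t]."""
    out, start = [], 1
    for bit in reversed(range(t.bit_length())):
        size = 1 << bit
        if t & size:
            out.append((start, start + size - 1))
            start += size
    return out


def private_prefix_sums(stream, eps, rng=None):
    """Noisy estimate of every prefix sum of a non-empty stream over {-1, 0, 1}."""
    rng = rng or random.Random()
    n = len(stream)
    scale = (1 + math.log2(n)) / eps      # noise scale (1 + log2 N) / eps
    exact = [0]
    for b in stream:
        exact.append(exact[-1] + b)
    noisy = {}                            # one noisy count per tree node, reused once drawn

    def node(s, e):
        if (s, e) not in noisy:
            noise = laplace(scale, rng) if scale > 0 else 0.0
            noisy[(s, e)] = exact[e] - exact[s - 1] + noise
        return noisy[(s, e)]

    # Each estimate sums at most log2(N) + 1 noisy counts along the frontier.
    return [sum(node(s, e) for s, e in frontier(t)) for t in range(1, n + 1)]
```

Because every prefix is covered by at most $\log_2 N + 1$ nodes, each estimate accumulates only logarithmically many noise terms, which is where the polylogarithmic error bound comes from.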


Handling multiple tapes. We extend the algorithm to handle k tapes by maintaining k prefix 
sums computed in parallel. This clearly does not hurt utility or obliviousness and only incurs 
a k factor in running time. However, naively, it also incurs a k factor in differential privacy. 
Nevertheless, we observe that considering any two neighboring sequences of inputs, k — 1 of the 
tapes will have the exact same access pattern while only one will differ in one position, and so this 
extension, in fact, does not incur a k factor in differential privacy. 


Running time. The main challenge is to maintain an updated version of the noisy counts associated
to the nodes in the frontier. Recall that the frontier is of size $\log_2 N + 1$. Naively, with
the above algorithm, computing the frontier at time i + 1 from the frontier at time i may cost
up to $O(\log N)$ work which is too expensive for us. However, recall that we do not need a prefix
sum after every $a_j$, but rather we want to output one after every p inputs. So, instead of having
a full binary tree where the leaves correspond to each input, we consider a full binary tree where
each leaf corresponds to a sequence of p inputs and it is labeled by their sum. The depth of this
tree is $\log_2(N/p)$ and the point is that we need to compute the “next” frontier (which costs about
$\log_2 N$) only once every p operations, so the total cost is $O(k \cdot (N/p) \cdot \log N)$ plus the time it takes
to aggregate the sum itself which is $O(k \cdot N)$, as needed. □
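
The batching step can be illustrated with a short sketch for a single tape. The helper names below are hypothetical; the code only shows how p consecutive moves collapse into one leaf and how the number of frontier updates drops from N to N/p.

```python
MOVE = {"L": -1, "S": 0, "R": 1}


def block_sums(moves, p):
    """Sum each run of p consecutive moves; these sums label the leaves of the smaller tree."""
    vals = [MOVE[m] for m in moves]
    return [sum(vals[i:i + p]) for i in range(0, len(vals), p)]


def positions_every_p(moves, p):
    """True head position after each block of p moves (the exact prefix sums)."""
    out, pos = [], 0
    for s in block_sums(moves, p):
        pos += s
        out.append(pos)
    return out


def frontier_updates(n, p):
    """Number of frontier recomputations: one per block instead of one per move."""
    return n // p
```

For example, `positions_every_p("RRLSRRRL", 4)` yields the head position after steps 4 and 8, and only two frontier updates are needed instead of eight.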


Remark 3.2 (Sampling from Lap). We emphasize that the above algorithm assumes that a Turing
machine is capable of sampling from $\mathrm{Lap}(\cdot)$ in O(1) time. This is assumed for simplicity of
presentation. However, it is possible to efficiently compute an estimate of this distribution on a standard
Turing machine. The cost of this approximation is good enough to obtain (asymptotically) the same 
final result in Theorem 5.1. Therefore, the assumption being made in this section is without loss of 
generality in the context of our main result. See details in Appendix A. 


4 Oblivious Turing Machines 


A classical result by Pippenger and Fischer [PF79] shows that any Turing machine computation can 
be made perfectly oblivious (i.e., Definition 2.2) on a two-tape machine with amortized logarithmic 
overhead. More precisely, any Turing machine that makes at most N steps can be made perfectly 
oblivious while making O(N - log N) steps. 

In our application we need an oblivious Turing machine that supports two additional properties.
The first, called “initialization”, is that one can initialize an oblivious Turing machine with a given
memory (as opposed to starting off with an empty memory). The second, called “destruction”,
returns the state of the memory in a linear fashion.

Our construction is similar in spirit to the one of Pippenger and Fischer [PF79]. However, we 
present it in a language that more closely resembles the hierarchical oblivious RAM construction 
of Goldreich and Ostrovsky [Gol87,GO96] so it might be easier to understand for those who are 
familiar with the latter. 


Theorem 4.1 (Oblivious Turing machine, revisited). Any k-tape Turing machine that makes at
most N steps can be executed obliviously on a two-tape Turing machine with O(N) space and with
$O(N \cdot \log N)$ steps. Additionally, the machine supports initialization and destruction.


Proof. We will present the main idea in the special case where the given Turing machine has only
a single tape and the resulting machine will have $\ell = \lceil \log N \rceil$ tapes. Later, we will explain how to
handle multiple-tape machines in the input and simulate them obliviously with just two tapes (at
the same cost).

We refer to each of the $\ell$ tapes as a “level” (analogously to levels in [GO96]’s hierarchical ORAM
construction). Level 1 has capacity to hold $2 \cdot 3 = 6$ cells, level 2 has capacity to hold $4 \cdot 3 = 12$
cells, and in general level i has capacity to hold $2^i \cdot 3$ cells. Each level is split into virtual “blocks”,
each holding $2^i$ cells. Every such block starts at an address j such that $j \bmod 2^i = 0$. That is,
level 1’s blocks start at even addresses, level 2’s blocks start at addresses divisible by 4, and so
on. At the beginning, when a level is built, the head of that level points to the middle block; for
concreteness, say it always points to the leftmost element in the block.

Recall that an access consists of a read or write operation as well as a direction to move the
head to, left or right. When an access is made, all cells in the middle block of the smallest level
are scanned; all of them are ignored except the single cell that the real head points to, which is
the one we need to read from or write to. The original TM head position is stored in a
designated register and it is updated upon each access.

Every level i is rebuilt every $2^{i-1}$ logical steps. That is, level 1 is rebuilt after every step, level
2 is rebuilt after 2 steps, and so on. Here is how a rebuild of level i works:


1. Level i writes its contents to level i + 1 in a block-wise manner. That is, each block in level
i puts itself in the right place (out of 6) in level i + 1. To do this obliviously, each block in
level i tries to place itself in each of the possible positions and only one of these attempts will
not be a dummy. (This costs O(1) head movements.)


2. The address of the middle block for level i is recomputed based on the original TM
head position and then level i gets updated contents from level i + 1 in a block-wise manner.
That is, each block in level i gets updated information from the right place (out of 6) in level
i + 1. This is also done via O(1) head movements in brute-force.
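
To see where the logarithmic overhead comes from, the rebuild schedule can be tallied numerically. The sketch below assumes (our simplification) that one rebuild of level i costs a constant `cost_per_cell` times the level capacity $3 \cdot 2^i$; the function name is illustrative.

```python
import math


def rebuild_cost(n_steps, cost_per_cell=1):
    """Total rebuild work over n_steps logical steps, plus the number of levels.

    Assumes a rebuild of level i costs cost_per_cell * (3 * 2**i) and that
    level i is rebuilt once every 2**(i-1) logical steps.
    """
    levels = max(1, math.ceil(math.log2(n_steps)))
    total = 0
    for i in range(1, levels + 1):
        rebuilds = n_steps // 2 ** (i - 1)
        total += rebuilds * cost_per_cell * 3 * 2 ** i
    return total, levels
```

Each level contributes roughly 6N units of work regardless of i, because larger levels are rebuilt proportionally less often, so the total scales as N times the number of levels.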


Correctness follows immediately by description. Perfect obliviousness follows from the fact that
the head’s movements are deterministic and a-priori fixed. For efficiency, consider any sequence of
N steps. A read or a write is done at O(1) operations cost by just accessing the middle block in
the first level. It remains to account for the cost of the reorganization steps. By description and
recalling that the size of level i is $2^i \cdot 3$ and that level i is rebuilt every $2^{i-1}$ steps, the total
number of steps performed by the oblivious Turing machine is bounded by

$$\sum_{i=1}^{\ell} \frac{N}{2^{i-1}} \cdot O(2^i \cdot 3) = O(N \cdot \ell) = O(N \cdot \log N).$$


We now explain how to remove the simplifying assumptions we had, the first being that the 
input machine has only one tape and the second being that the resulting oblivious machine uses 
log N many tapes. Let us first handle the former, letting k be the number of tapes used by the 
input machine. We use an encoding trick. We encode the k tapes into a single tape by first 
placing all the first cells from each tape, then the second cell, and so on. Each “track” will have 
its own head marker. By the construction of the oblivious Turing machine, all the tracks can be 
processed simultaneously (recall that our head movement sequence is deterministic), incurring a k 
multiplicative factor. 

Now, we explain how to modify our Turing machine to use only two tapes. As a first step, let 
us place the different levels one after the other on the single tape. Naively, this incurs a blowup 
in running time due to the reorganization steps. Indeed, in the reorganization steps, we need to 
scan two levels “in parallel” as cyclic buffers. The only way to do this with a single tape is by 
moving back and forth in the tape which is too expensive. This is where we will use the second 
tape. When such a “parallel” scan is needed, we will copy one of the levels to the second tape, do
the “parallel” scan by scanning both tapes in parallel, and then copy it back. This only incurs a
constant overhead.


Initialization and destruction. In our application, we will need an oblivious Turing machine 
with two additional features so we explain how to implement them next. The first is that we need 
to support initialization with a given memory which might not necessarily be empty. We implement 
this by starting with an empty memory, as described above, and modifying the memory one element 
at a time. If the number of steps that we eventually perform on the Turing machine is about the 
size of the initial memory, the cost of this step will be amortized away. 

The second feature is a destruction procedure which outputs the memory at the end of the
computation in a linear fashion. This is not so immediate since our construction does not store the
memory in a linear fashion. Recall that our construction satisfies that at the end of every $2^i$ steps,
the cells corresponding to the level $T_i$ store the content of the tape at positions $[p - 2^{i-1} : p + 2^{i-1}]$.
This means that if “destruct” is invoked after a power-of-2 many steps, the memory is stored exactly
in the cells corresponding to some level and one can make one linear scan to extract those elements
and put them one next to the other. If destruct is invoked after some other number of steps, we
need to modify this procedure slightly by collecting the most updated memory values of each cell
from the appropriate level (now, the most updated values are spread amongst different cells). This
again can be done by a single scan. □


5 A Differentially Oblivious Turing Machine 


In this section, we describe our transformation from any Turing machine into a differentially obliv- 
ious one. 


Theorem 5.1. For any $\epsilon, \delta > 0$, any k-tape Turing machine that makes at most N steps and
consumes S space can be transformed into an $(\epsilon, \delta)$-differentially oblivious max{2,k}-tape Turing
machine that makes $O(N \cdot (\log(1/\epsilon) + \log\log N + \log\log(1/\delta)))$ steps and consumes
$O(S + (1/\epsilon) \cdot \log^2 N \cdot \log(1/\delta))$ space.


The construction of the differentially oblivious Turing machine uses the oblivious Turing ma- 
chine construction from Section 4 and the head’s location estimation algorithm from Theorem 3. 
We will present the construction in steps. We first assume that the input machine uses only one 
tape and the resulting machine will use many tapes. Then, we will explain how to get rid of both 
simplifying assumptions and therefore obtain Theorem 5.1. 


5.1 From One Tape to Four Tapes 


Assume first that the given machine, M, uses only a single tape. We first present a construction
that compiles M into a differentially oblivious Turing machine doM with 4 tapes. Fix $\epsilon, \delta > 0$ for
the rest of this section.


Tape allocation. The resulting Turing machine, doM, will consist of four tapes, numbered 1, 2, 
3, and 4, in the following order: 


1. One tape to simulate the input Turing machine computation (recall that we assumed that
the input machine has only one tape).


2. Two tapes for running an oblivious Turing machine (according to Section 4).


3. One tape to run the differentially private head’s location estimation algorithm (according to
Section 3).


Figure 1: An illustration of the differentially oblivious TM tape configuration for compiling a
single-tape TM into a 4-tape machine.


The algorithm. As mentioned, we use the oblivious Turing machine implementation from Sec- 
tion 4 but since its overhead is logarithmic (in the running time of the non-oblivious machine), we 
do not want to apply it directly on our machine. Instead, we are going to break down the com- 
putation of the original machine into epochs and invoke the oblivious machine only within epochs. 
Concretely, we split the computation of M into epochs of size 


$$p = (1/\epsilon) \cdot \log^2 N \cdot \log(1/\delta).$$


Each such epoch will be executed in its own “fresh” oblivious Turing machine and so the overhead 
will only be a doubly logarithmic factor in N. Next, we explain how doM works. 
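
To get a feel for the parameters, the epoch size can be evaluated numerically. This is only a sketch: using base-2 logarithms and rounding up are our assumptions, since the asymptotic statement does not fix constants.

```python
import math


def epoch_size(n, eps, delta):
    """p = (1/eps) * log^2 N * log(1/delta), rounded up (base-2 logs assumed)."""
    return math.ceil((1 / eps) * math.log2(n) ** 2 * math.log2(1 / delta))


# Example: N = 2^20 steps, eps = 1, delta = 2^-40.
p = epoch_size(2 ** 20, 1.0, 2.0 ** -40)
per_step_overhead = math.log2(p)   # the oblivious TM pays about log p per step
```

With these illustrative values p is 16000, so the per-step overhead inside an epoch is about 14, compared with $\log_2 N = 20$ if the oblivious machine were applied to the whole computation.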


$\mathsf{doM}^{1 \to 4}_{\epsilon,\delta}$:
1. Set $\tilde{h}^0 = 0$ to be the initial approximate position of the head (it is equal to the real position).


2. Break the N-step computation into epochs of p steps of computation. For epoch $i = 1, \ldots, N/p$,
do:


(a) Copy an area of size 4p + 1 around $\tilde{h}^{i-1}$, namely $[\tilde{h}^{i-1} - 2p, \tilde{h}^{i-1} + 2p]$, to the oblivious
Turing machine (Theorem 4.1). Perform the next p steps of computation there. At the
end of the epoch, copy the state of these 4p + 1 cells back to the main tape.


(b) In parallel, keep track of the movements of the head and count the offset of the head
compared to the previous location, $\tilde{h}^{i-1}$. At the end of the epoch, invoke the differentially
private head’s location estimation algorithm (Theorem 3.1) with privacy parameters $\epsilon$
and $\delta$ to update the location of the head $\tilde{h}^i$.


See an illustration in Figure 1. 
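
The epoch structure can be mimicked in a few lines to check that the head never leaves the copied window. In this sketch the estimator is a stand-in (true position plus a bounded integer error), not the mechanism of Theorem 3.1, and the oblivious machine itself is not simulated; `run_epochs` and `max_err` are hypothetical names.

```python
import random


def run_epochs(moves, p, max_err, rng=None):
    """Replay a single-tape move sequence epoch by epoch and check window containment.

    The estimate is the true position plus an integer error of magnitude at most
    max_err; this stands in for the differentially private estimator.
    """
    rng = rng or random.Random(0)
    step = {"L": -1, "S": 0, "R": 1}
    true_pos, est = 0, 0                      # real head position and its estimate
    windows = []
    for start in range(0, len(moves), p):
        lo, hi = est - 2 * p, est + 2 * p     # the 4p + 1 cells copied for this epoch
        windows.append((lo, hi))
        for m in moves[start:start + p]:
            true_pos += step[m]
            if not lo <= true_pos <= hi:
                raise AssertionError("head left the copied window")
        est = true_pos + rng.randint(-max_err, max_err)   # end-of-epoch update
    return windows
```

As long as `max_err` is smaller than p, the check never fails: the estimate is off by less than p when the epoch starts and the head moves at most p cells during it.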
Theorem 5.2. For any $\epsilon, \delta > 0$ and given any single-tape Turing machine M that makes at most
N steps and consumes S space, the 4-tape machine $\mathsf{doM}_{\epsilon,\delta}$ is $(\epsilon, \delta)$-differentially oblivious, makes at
most $O(N \cdot (\log(1/\epsilon) + \log\log N + \log\log(1/\delta)))$ steps and consumes
$O(S + (1/\epsilon) \cdot \log^2 N \cdot \log(1/\delta) \cdot (\log(1/\epsilon) + \log\log N + \log\log(1/\delta)))$ space.


Proof. We first prove correctness, ignoring obliviousness. Consider any sequence of operations.
At any point in time, the oblivious Turing machine contains 4p + 1 memory cells and performs
all necessary operations within. For correctness, by description, it is enough to show that the p
operations are indeed contained within those 4p + 1 cells. Indeed, for this to hold it is enough to
argue that $\tilde{h}^i$ is close enough to the real location of the head: $\tilde{h}^i - 2p < h^i - p$ and $\tilde{h}^i + 2p > h^i + p$,
where $h^i$ is the true location of the head. In other words, we need to show that

$$|\tilde{h}^i - h^i| < p.$$

By the utility property of the head’s location estimation algorithm (Theorem 3.1), we know that
$|\tilde{h}^i - h^i| \le (1/\epsilon) \cdot \log^{1.5} N \cdot \log(1/\delta)$ (this is the upper bound on the additive error of each head’s
location estimation for every i). Now, the above inequality follows by recalling that
$p = (1/\epsilon) \cdot \log^2 N \cdot \log(1/\delta)$.
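
Spelled out, the containment argument is a triangle-inequality chain. Here $h^{i-1}_t$ is notation introduced only for this illustration: it denotes the true head position after $t \le p$ steps of epoch i, while $h^{i-1}$ and $\tilde{h}^{i-1}$ are the true and estimated positions at the start of the epoch.

```latex
\begin{align*}
|h^{i-1}_t - \tilde{h}^{i-1}|
  &\le |h^{i-1}_t - h^{i-1}| + |h^{i-1} - \tilde{h}^{i-1}| \\
  &< t + p \\
  &\le 2p .
\end{align*}
```

The first term is at most t because the head moves at most one cell per step, and the second is below p by the utility bound, so every cell touched during the epoch lies in $[\tilde{h}^{i-1} - 2p, \tilde{h}^{i-1} + 2p]$.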

To prove $(\epsilon, \delta)$-differential obliviousness, consider any two sequences of operations $I_0$ and $I_1$ that
differ at one operation. Consider the random variable $\tilde{h}_b$ corresponding to the physical tape heads
locations on input $I_b$ for $b \in \{0, 1\}$. Say the two sequences $I_0$ and $I_1$ differ in the ith operation and
are otherwise identical. Then, the first i − 1 operations result in identical distributions of head
locations in both executions (as all the underlying building blocks are perfectly oblivious). The
only difference is in the ith operation. There, the head’s locations might differ due to a different
distribution of the head’s location estimation algorithm (Theorem 3.1). However, we are guaranteed
that this algorithm is $(\epsilon, \delta)$-differentially private. The rest of the heads’ movements are perfectly
oblivious: the oblivious Turing machine is perfectly oblivious, the head’s location estimation
algorithm itself is perfectly oblivious, and the other operations that we do in the implementation of
doM are trivially oblivious.

Lastly, we analyze efficiency by counting the total amount of work and space required to handle
any sequence of N operations that consume S space. Step 2a costs $O(p \cdot \log p)$ operations and space due
to Theorem 4.1 (the rest of the operations can be implemented in O(p) time and space). Computing
the differentially private head’s location estimation in Step 2b takes overall $O(N + (N/p) \cdot \log N) \le O(N)$
time due to Theorem 3.1 (i.e., constant amortized work per access). Otherwise, simulating
the original computation and accounting for the location of the head requires O(N) work and
O(S) space. Overall, over N operations, the total space is $O(S + p \cdot \log p)$ and the work is bounded by
$O(N \cdot \log p)$. Plugging in $p = (1/\epsilon) \cdot \log^2 N \cdot \log(1/\delta)$ completes the proof. □


5.2 From One Tape to Two Tapes 


In this section we show how to obtain the same result as in the previous section (i.e., Theorem 5.2), 
except that our resulting Turing machine will only use two tapes (instead of four). One tape will be 
used for the simulation of the original Turing machine plus one of the tapes of the oblivious Turing 
machine and the other tape will be used to perform the head’s location estimation algorithm and 
the other tape of the oblivious Turing machine. 

Recall that the tapes used for the oblivious Turing machine both consume about p space,
but one interacts with the main tape (call it tape $\mathsf{oTM}_1$) and the other acts as a scratch pad
and the values that are written there are never accessed outside of the oblivious Turing machine
implementation (call it tape $\mathsf{oTM}_2$). We will merge tape $\mathsf{oTM}_2$ into the tape that simulates the
original computation, and tape $\mathsf{oTM}_1$ into the tape that computes the prefix sums.
Tape 1 (Main Tape). This tape will consist of the main computation tape as well as a blank area
which is used for tape $\mathsf{oTM}_2$ of the oblivious Turing machine. We use the fact that the required space
for the oblivious Turing machine is O(p) cells and so we will maintain such a “space of blanks”
which will not be too far from the real position of the head and will be used whenever a new
oblivious Turing machine is instantiated. Let us denote by $S_{\mathsf{oTM}} = O(p)$ the space consumption
of the oblivious Turing machine. Our first tape, the one that simulates the computation of the
original Turing machine, will maintain the invariant that at distance $S_{\mathsf{oTM}}$ to the right of the
approximate head location $\tilde{h}$, there are $S_{\mathsf{oTM}}$ blank cells (the last blank cell has distance
$2 S_{\mathsf{oTM}}$ from $\tilde{h}$). If this invariant holds, then whenever an epoch begins, we can move to the blank
area and use it as the oblivious Turing machine tape. At the end of the epoch, we can go back to
where we were. Since we perform p operations inside the oblivious Turing machine, the amortized
cost of moving back and forth is O(1) per operation which is what we need.

We thus need to explain how to maintain the above invariant. The idea is to move the blank area
together with the location of the head once every epoch. Namely, once we update the approximate
position of the head $\tilde{h}$, we will also move the blank area appropriately so that its distance from
the new $\tilde{h}$ is as we require. Moving the blank area, as above, can be done simply in time O(p)
using a designated size O(p) space in the other tape (tape 2); this is done by moving the area that
needs to go to the blank area to tape 2 (and replacing it with blanks), and then copying by moving
both heads “in parallel”.


Tape 2 (Secondary Tape). This tape will consist of three areas, each of size O(p). One of these 
areas will be used for moving the blank area in tape 1, as we explained above. Another area is for 
the computation of the head’s location estimation: this algorithm has state of size $O(\log N) \le O(p)$
(which contains a frontier of a tree of noisy sums per interval). The third part is for tape $\mathsf{oTM}_1$ of
the oblivious Turing machine (which also uses O(p) space).

The first and second parts in this tape are accessed at the end of every epoch and some compu- 
tation of length O(p) is performed on each of them (either updating the prefix sum or moving data 
around). Since each epoch handles O(p) operations, the amortized cost of this part is O(1) per 
operation of the original machine. The third part, in contrast, is accessed throughout the epoch, 
and there we get O(log p) overhead per operation. 


$\mathsf{doM}^{1 \to 2}_{\epsilon,\delta}$:
1. Set $\tilde{h}^0 = 0$ to be the initial approximate position of the head (it is equal to the real position).


2. Allocate an empty area of $S_{\mathsf{oTM}} = O(p)$ cells in Tape 1 with some large enough hidden
constant. Call this “the blank area”. This area will be $S_{\mathsf{oTM}}$ away from $\tilde{h}^0$.


3. Break the N-step computation into epochs of p steps of computation. For epoch $i = 1, \ldots, N/p$,
do:


(a) Copy an area of size 4p + 1 around $\tilde{h}^{i-1}$ from Tape 1, namely $[\tilde{h}^{i-1} - 2p, \tilde{h}^{i-1} + 2p]$, to
the beginning of Tape 2. Perform the next p steps of computation using an oblivious
TM, treating Tape 2 as the main tape and the blank area in Tape 1 as the scratch tape.
At the end of the epoch, copy the state of these 4p + 1 cells from Tape 2 back to the right
location in Tape 1.


(b) In parallel, keep track of the movements of the head and count the offset of the head
compared to the previous location, $\tilde{h}^{i-1}$. At the end of the epoch, invoke the differentially
private head’s location estimation algorithm (Theorem 3.1) with privacy parameters $\epsilon$
and $\delta$ to update the location of the head $\tilde{h}^i$. This is done using a designated area
in Tape 2 (residing after the area used for the oblivious TM computation). Lastly, move
the blank area in Tape 1 so that the invariant that it resides $S_{\mathsf{oTM}}$ away from $\tilde{h}^i$ is
maintained (for this, use a designated area in Tape 2).


Figure 2: An illustration of the differentially oblivious TM tape configuration for compiling a
single-tape TM into a 2-tape machine.


See an illustration in Figure 2. 


5.3 From k Tapes to k Tapes (for k ≥ 2)


This extension is done by making two changes. First we use k tapes to simulate the computation 
of the original k-tape machine (instead of just a single tape). Second, we use the algorithm for 
estimating the head’s position which works for k tape machines (Theorem 3.1)—this incurs an 
overhead of k operations per step. Recall that this algorithm just runs the algorithm for estimating 
the head’s position of a single tape k times (independently). It remains to explain where we execute 
this algorithm and also where we execute the oblivious Turing machine since now we do not have 
an extra work tape. 

More precisely, we modify each tape to have two “blank areas”, each as above. The first one
will act as the “Main Tape” in the above construction and the second one acts as the “Secondary
Tape” for another tape. Concretely, tape $(i + 1) \bmod k$ acts as the “Secondary Tape” of tape i (for
all $i \in [k]$). That is, the first blank area of tape i is used to simulate the computation of
tape i in the original machine and also to execute tape $\mathsf{oTM}_2$ of the oblivious Turing machine when
simulating tape i (this is exactly the same usage of the blank area as above). The second blank area
of tape i consists of three areas: (1) an area used to maintain the blank areas in tape $(i + 1) \bmod k$,
(2) an area used for the computation of the head’s location estimation of tape $(i + 1) \bmod k$, and
(3) tape $\mathsf{oTM}_1$ of the oblivious Turing machine when simulating tape $(i + 1) \bmod k$.

See an illustration in Figure 3. 


6 Lower Bound 


In this section we prove that our differentially oblivious Turing machine is optimal in terms of
overhead in a natural range of parameters. Specifically, we prove the following theorem.

Figure 3: An illustration of the differentially oblivious TM tape configuration for compiling a
k-tape TM into a k-tape machine.


Theorem 6.1. There exists an algorithmic task for which there is a Turing machine that on input
of size N completes it in O(N) steps. On the other hand, for any $0 < s < \sqrt{N}$, $\epsilon > 0$, $0 < \beta < 1$,
and $0 < \delta < \beta \cdot (\epsilon/s) \cdot e^{-2\epsilon s}$, any $(\epsilon, \delta)$-differentially oblivious implementation (even on a RAM and
in the balls and bins model) for this task must consume $\Omega(N \cdot \log s)$ steps with probability $1 - \beta$.


Proof. Chan et al. [CCMS19] prove the following theorem concerning the overhead required
to stably sort a set of balls according to associated 1-bit keys while maintaining differential
obliviousness. Here, we assume that the balls are opaque and so no non-trivial encoding on them
can be done [BN16].


Theorem 6.2 (Theorem 4.7 in [CCMS19]). Let $0 < s < \sqrt{N}$. Suppose $\epsilon > 0$, $0 < \beta < 1$, and
$0 < \delta < \beta \cdot (\epsilon/s) \cdot e^{-2\epsilon s}$. Then, any (even randomized) stable sorting algorithm for balls according to
associated 1-bit keys in the RAM model that is $(\epsilon, \delta)$-differentially oblivious must have some input,
on which it incurs at least $\Omega(N \cdot \log s)$ memory accesses with probability at least $1 - \beta$.


The task of stably sorting N balls according to associated 1-bit keys can be implemented using
a Turing machine in O(N) steps. Consider an input of the form $(k_1, v_1), \ldots, (k_N, v_N)$, where
$k_i \in \{0, 1\}$ is a 1-bit key and $v_i$ is the ith ball. The idea is to scan the input from the beginning and
whenever we see an element $(k_i, v_i)$ we do one of the following. If $k_i = 0$, we write $(k_i, v_i)$ to the
next position in the output tape. If $k_i = 1$, we write it to the next position in the work tape. After
we finish scanning the input, we scan the work tape and write all of its elements from first to last into
the output tape. It is immediate that this algorithm is correct and has O(N) running time. □
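
The two-pass procedure can be sketched with Python lists standing in for the tapes (an abstraction of ours: each list is only appended to or read sequentially, mirroring one-directional head movement).

```python
def stable_sort_1bit(items):
    """Stably sort (key, ball) pairs with key in {0, 1} using two sequential passes."""
    output, work = [], []
    for key, ball in items:          # pass 1: one left-to-right scan of the input
        if key == 0:
            output.append((key, ball))
        else:
            work.append((key, ball))
    for pair in work:                # pass 2: append the key-1 items in arrival order
        output.append(pair)
    return output
```

Both passes touch each element a constant number of times, and the relative order within each key class is preserved, which is what stability requires.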
A Sampling Noise on a Turing machine


One of the operations our differentially oblivious Turing machine needs to do is to sample from the
(continuous) Laplacian distribution $\mathrm{Lap}(x)$; this is used in the algorithm for estimating the head’s
location; see Section 3. There, we need to generate a sample from $\mathrm{Lap}((1 + \log_2 N)/\epsilon)$ and we
need to do this about N times. Recall that a Laplacian distribution is unbounded and samples
need infinite precision. We show that with small tolerable loss in precision (which does not affect
our final result), one can sample an approximation from this distribution on a standard Turing
machine.

We assume that $\ln(1/\epsilon)$ is an integer so that we do not have rounding issues. Also, recall that
$\delta$ is a negligible function of the form $\exp(-\log^2 N)$. First, we switch to a bounded version of the
distribution, chopping off the tail which contains elements that occur with negligible probability. We
can assume that we sample from the range $\pm(\log(N)/\epsilon) \cdot \mathrm{poly}\log(1/\delta)$. Let us call $\delta_0$ the probability
mass that we chopped off. Sampling from the bounded version turns our $(\epsilon, \delta')$-differentially private
prefix sum algorithm into an $(\epsilon, \delta)$-differentially private one, where $\delta = \delta' + N \cdot (e^{\epsilon} \cdot \delta_0 + \delta_0)$. To see
this, observe that we have essentially N instances of the Lap noise and so, by a simple union bound,
each event under the bounded distribution happens with probability
at most $N \cdot \delta_0$ larger than in the unbounded version. Namely, for any set S,
$\Pr[X_{\mathrm{bounded}} \in S] \le \Pr[X \in S] + N \cdot \delta_0$, where $X_{\mathrm{bounded}}$ is the output of the mechanism
when using bounded noise and X is the output of the original mechanism. Then, by differential privacy,
$\Pr[X \in S] + N \cdot \delta_0 \le e^{\epsilon} \Pr[Y \in S] + \delta' + N \cdot \delta_0$,
where Y is the output of the unbounded-noise mechanism on an arbitrary neighboring input. Then again by
bounding the noise used in Y, we get

$$\Pr[X_{\mathrm{bounded}} \in S] \le e^{\epsilon} \big(\Pr[Y_{\mathrm{bounded}} \in S] + N \delta_0\big) + \delta' + N \delta_0.$$

Since we think of $\delta_0$ as being negligible in N and $\epsilon$ as being a constant, $\delta$ is also negligible.*


* The above analysis was very loose. In particular, one can do a tighter analysis and not lose the linear-in-N factor
in $\delta$ but for our purposes it does not matter since $\delta$ is negligible in N anyway.
The next step is to represent each element in the bounded range with finite precision. We want
to lose at most a $\delta$ factor in precision (which will add another additive $\delta$ factor to our additive error),
and so if we use $\ell$ bits of precision, we have the inequality:

$$2^{-\ell} \cdot ((\log N)/\epsilon) \cdot \mathrm{poly}\log(1/\delta) \le \delta.$$

This means that it is enough to use $\ell \in O(\log((\log N \cdot \log\log(1/\delta))/(\epsilon\delta)))$ bits of precision which
can be bounded by $O(\log^2(1/\delta))$ bits since $\delta$ is negligible in N and $\epsilon$ is a constant. Therefore, all
operations can be executed efficiently enough (in time $O(\mathrm{poly}\log N)$), which by slightly changing
parameters (e.g., the value of p), does not affect our asymptotic upper bound on the running time
of our differentially private Turing machine.
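
The two approximation steps, bounding the range and fixing the precision, can be sketched as follows. This is an illustration with floating-point arithmetic rather than a Turing machine procedure; the function names are ours, and truncation is done by rejection sampling, which is one possible reading of "chopping off the tail".

```python
import math
import random


def truncated_discrete_laplace(scale, bound, bits, rng=None):
    """Sample Lap(scale) conditioned on |x| <= bound, rounded to a multiple of 2**-bits.

    Assumes `bound` is itself a multiple of 2**-bits so rounding stays in range.
    """
    rng = rng or random.Random()
    while True:                               # rejection step: drop tail samples
        x = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        if abs(x) <= bound:
            break
    grid = 2.0 ** -bits
    return round(x / grid) * grid             # keep `bits` fractional bits


def precision_bits(bound, delta):
    """Smallest integer l with 2**-l * bound <= delta."""
    return max(0, math.ceil(math.log2(bound / delta)))
```

For instance, with range bound 8 and target loss $2^{-10}$, `precision_bits` returns 13, since $2^{-13} \cdot 8 = 2^{-10}$.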